An Automated Allele-calling System for High-throughput Microsatellite Genotyping
Abstract
Microsatellite markers are widely used for genetic analysis in biomedical research, agriculture, population and evolutionary biology, as well as in forensics and diagnostics. Advances in laboratory automation and data collection have increased throughput and reduced the cost of large-scale genotyping. One step in the measurement process that still needs improvement is “allele calling”, in which raw electrophoresis signals are converted into discrete genotypes. This step is still largely a laborious manual process that accounts for more than a quarter of the genotyping cost. Automating allele calling is hampered by, among other factors, the presence of “stutter patterns” (artefact peaks introduced during PCR amplification) and variation in electrophoretic migration behavior. Both effects are marker specific, which makes it difficult to devise an algorithm that works for all markers without marker-specific calibration. This thesis proposes an allele-calling method built on two main computer programs: (1) STRAL, a trace-alignment algorithm that normalizes variation in the “time domain” of the observed chromatograms, and (2) FA, a pattern-recognition algorithm that performs allele calling on the aligned chromatograms. Both are adaptive and require no marker-specific calibration. For a given observation, each possible genotype is assigned a quality score related to the probability of a calling error; this score can be used to rank the candidate genotypes and select the most likely one(s). Benchmark tests were performed on ∼33,000 genotypes taken arbitrarily from the daily output of a genotyping service laboratory. Performance is characterized by a trade-off between true calls and miscalls at a given quality-score cutoff. At a cutoff corresponding to less than 1% error (acceptable for most purposes), 55% of the data can be called correctly (or 70% of the data that human analysts are able to call).
This performance is still far from that of manual calling, which correctly calls 80% of the total data with less than 0.2% error. It is nevertheless useful for a hybrid system in which up to 70% of the data is scored automatically (15% of it being automatically rejected) and the remaining 30% is examined manually, with only 5% of those requiring corrections. We conclude that this prototype is worth implementing for actual applications…
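The quality-cutoff trade-off and the hybrid workflow described in the abstract can be illustrated with a minimal Python sketch. It assumes that each benchmark observation is reduced to the quality score of its top-ranked genotype plus a flag saying whether that genotype agreed with the reference (manual) call, that a higher score means a more reliable call, and that "error" is measured as miscalls among accepted calls. The names `Call`, `tradeoff_at_cutoff`, `lowest_cutoff_for_error`, `triage` and `workload` are hypothetical, as is the two-threshold routing rule that treats very low quality as an automatic rejection; none of this is taken from STRAL or FA.

```python
# Hypothetical sketch: not the thesis's STRAL/FA code.
from collections import Counter
from dataclasses import dataclass
from enum import Enum


@dataclass
class Call:
    """One benchmark observation (assumed representation): the quality score
    of the top-ranked genotype and whether it agreed with the manual call."""
    quality: float
    correct: bool


def tradeoff_at_cutoff(calls, cutoff):
    """Accept every call whose quality is >= cutoff.

    Returns (true_call_fraction, error_rate), where
      true_call_fraction = correct accepted calls / all observations
      error_rate         = miscalls among accepted calls / accepted calls
    """
    if not calls:
        raise ValueError("no observations")
    accepted = [c for c in calls if c.quality >= cutoff]
    correct = sum(1 for c in accepted if c.correct)
    miscalls = len(accepted) - correct
    error_rate = miscalls / len(accepted) if accepted else 0.0
    return correct / len(calls), error_rate


def lowest_cutoff_for_error(calls, max_error):
    """Return (cutoff, true_call_fraction, error_rate) for the lowest observed
    quality value whose error rate is <= max_error, or None if none qualifies.

    Raising the cutoff only shrinks the accepted set, so the number of correct
    accepted calls never increases; the lowest qualifying cutoff therefore
    gives the largest true-call fraction within the error limit.
    """
    for q in sorted({c.quality for c in calls}):
        fraction, error = tradeoff_at_cutoff(calls, q)
        if error <= max_error:
            return q, fraction, error
    return None


class Route(Enum):
    AUTO_ACCEPT = "auto-accept"
    AUTO_REJECT = "auto-reject"
    MANUAL_REVIEW = "manual review"


def triage(quality, accept_cutoff, reject_cutoff):
    """Hypothetical two-threshold routing for a hybrid workflow:
    high quality is accepted, very low quality is rejected, and
    everything in between goes to a human analyst."""
    if reject_cutoff > accept_cutoff:
        raise ValueError("reject_cutoff must not exceed accept_cutoff")
    if quality >= accept_cutoff:
        return Route.AUTO_ACCEPT
    if quality < reject_cutoff:
        return Route.AUTO_REJECT
    return Route.MANUAL_REVIEW


def workload(qualities, accept_cutoff, reject_cutoff):
    """Fraction of observations sent down each route."""
    counts = Counter(triage(q, accept_cutoff, reject_cutoff) for q in qualities)
    return {route: counts[route] / len(qualities) for route in Route}
```

Sweeping the cutoff over a labelled benchmark set, as `lowest_cutoff_for_error` does, is one simple way to choose an operating point such as "below 1% error". The thesis's actual quality score and rejection criteria may differ.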
Publication date: 2003